perm filename VIS[00,BGB]1 blob sn#069841 filedate 1973-11-09 generic text, type C, neo UTF8
2.0	Computer Vision Theory.

	2.1	Introduction to Computer Vision Theory.
	2.2	Related Work - State of the Art.
	2.3	Computer Vision Tasks.
	2.4	The Vision Cycle.
	2.5	The Nature of Images.
	2.6	The Nature of Worlds.
	2.7	Locus Solving.
	2.8	Grand Vision Theory.
	2.9	Summary of Arguments.


2.1	Introduction to Computer Vision Theory.

	Vision  is the  act  or power  of  seeing.   Computer  vision
concerns programming a  computer to do a task that demands the use of
an image forming light sensor,  such as a television camera.   Stated
in one  sentence,  my  theory is that  normal vision is  a continuous
process  of  keeping  an  internal  visual  simulator  in  sync  with
perceived images of the external reality, for the sake of some goal.

	But to start at a logical beginning, I wish to postulate the
existence of the external physical universe, the existence of images
and image processing, the existence of internal mental states, and
the existence of visual tasks and goals, as they commonly relate to
computer technology. Also for the sake of starting the discussion,
a vision system can be described as a mediator between perceived
images and a world model. The two poles (or operands) of the system
are called the "bottom" for images and the "top" for the models. The
"world model" operand can be identified even in vision systems that
do not advertise it. Work that truly lacks a world model is not
computer vision; usually it is image processing. Given the two
classes of operands, images and worlds, there are three operations
which a general vision system may perform: recognition, verification
and description.

	Verification vision is also called top-down or model-driven
vision. The verification approach involves predicting an image,
followed by comparing the predicted image and a perceived image for
slight differences which are expected but not yet measured.
Recognition vision and descriptive vision are also called bottom-up
or data-driven vision. Recognition vision is qualitative: what is in
the picture is determined by extracting a set of features
(qualities) and by classifying them according to an essentially
statistical world model. Description vision is quantitative. Many
theories are superficially different in that they consist of
compounding the three basic modes of vision, or of using different
forms of the two basic elements: image and model.

	In this chapter, several kinds of theory are presented.
There is general theory, which is my interpretation of the state of
the art of computer vision. There is the special theory, which
inspired this work and led to the particular design choices I wish
to elaborate and defend. There are alternate theories and designs,
which are mentioned for the sake of contrast. Finally, I will
conclude by giving my world view of the ultimate nature of visual
perception, consciousness and intelligence. The word "theory", as
used here, means simply a set of statements presenting a systematic
view of a subject. Specifically, I wish to exclude the connotations
that the theory is a mathematical theory or a natural theory.
Perhaps there can be such a thing as an "artificial theory" which
extends from the philosophy through the design of an artifact. An
artificial theory is validated by the successful design and
production of the intended artifact.

	In early 1942, there were five ideas on how to manufacture
fissionable material for a bomb: three uranium isotope separation
techniques (electromagnetic, centrifuge and gaseous-diffusion) and
two plutonium reactor techniques (graphite and heavy water). In spite
of the considerable power of theory in nuclear physics, there was
no a priori way to select the best method; so all the theories were
tried, and three of the methods were made to work by 1945. Although
several different theories of design may lead to the same ultimate
product, one theory is going to be the first to work, perhaps another
will work best, and perhaps yet a third will be the cheapest.
2.2	Related Work - State of the Art.

	In many papers, Larry Roberts is justly credited for doing
the seminal work in Computer Vision; and although his thesis
appeared over ten years ago, the subject has languished, dependent on
and subsumed by the four areas called Image Processing, Pattern
Recognition, Computer Graphics, and Artificial Intelligence. Thus I
will point out the relevant threads of computer vision in each of
these four subject areas.

(Computer vision and A.I.):
	At one extreme, computer vision may be described as merely
the problem of getting visual input hardware properly connected to a
computer; once the computer can "see" a raster of intensities in its
memory, the rest of the problem is artificial intelligence. The
other extreme is harder to depict because it requires figuring out
where to draw the line between vision software and intelligence
software; one goal I wish to pursue in this chapter is to demarcate
such a line.

Normal vision, as opposed to visual puzzles, is not an Artificial
Intelligence problem in the sense that it does not involve conscious
cognition, verbal abstraction, symbolism or self programming.

"The history of progress in the development  of systems for automatic
symbolic   integration  poses  an  interesting   question  about  the
definition of artificial intelligence. Few would argue  that Slagle's
SAINT  program was  a  product of  artificial intelligence  research.
Moses'  SIN program for symbolic integration  seldom needed to resort
to search,  and for  this reason some  people consider  it much  more
powerful (intelligent?) than  SAINT. Now, Risch (1969) has developed
an  algorithm  for  integrating  many  types  of  expressions.  Risch
considers himself  a  mathematician, not  an artificial  intelligence
researcher.  In your opinion  should Risch's  algorithm be considered
part of the subject matter of artificial intelligence? If  you would
exclude Risch  from artificial intelligence,  how would you  respond to
the  statement  that  every  artificial  intelligence  program  might
eventually  be dominated  by  a  (more intelligent?)  non  artificial
intelligence algorithm?  If you would  include Risch, would  you also
include the long-division algorithm?"

			- Nils J. Nilsson, problem 4-5;
			Problem-Solving Methods in Artificial Intelligence.

(Feigenbaum Quote).

	[the relation between Artificial Intelligence, experiment,
environmental simulation].

	"The design,  implementation, and use  of the  robot hardware
presents  some   difficult,  and  often  expensive,  engineering  and
maintenance problems. If  one is to  work in  this area solving  such
problems  is   a  necessary  prelude   but,  more  often   than  not,
unrewarding  because the activity  does not address  the questions of
A.I. research  that motivate  the project. Why,  then, build  devices?
Why not simulate  them and their environment? In  fact, the SRI group
has done  good work  in simulating  a  version of  their robot  in  a
simplified environment. The  answer given is  as follows. It  is felt
by  the  SRI  group  that  the  most  unsatisfactory  part  of  their
simulation effort was  the simulation of  the environment. Yet,  they
say that  90% of  the effort  of the simulation  team went  into this
part  of  the  simulation. It  turned  out to  be  very  difficult to
reproduce in an internal representation for a  computer the necessary
richness of environment that  would give rise to interesting behavior
by the highly  adaptive robot.  It is easier  and cheaper  to build  a
hardware robot  to extract what  information it  needs from the  real
world  than to organize  and store a  useful model.  Crudely put, the
SRI group's argument  is that the most  economic and efficient  store
of information about the real world is the real world itself."

					- E. A. Feigenbaum [ref. X].
---------------------------------------------------------------------
The traditional  subject of image  processing involves the  study and
development  of  programs that  enhance,   transform  and  compare 2D
images.  Nearly all such  image processing work can be  subsumed into
computer vision.
2.3	Computer Vision Tasks.

2.3.1	Continuity Tasks: the Cart Task & the Turn Table Task.
2.3.2	Analysis of a Blocks picture.
2.3.3	Recognition Tasks.


	Seemingly, the visual tasks are selected by the sponsors
or the administrators of research, rather than the choice of task
being considered a serious research question in its own right.

(Vision tasks).

	The computer vision problem I wish to  consider is to write a
program  that  can see  and act  with  respect to  the  real physical
world.    The  interest  of  other  researchers  in   modeling  human
perception,      in   participating  in   traditional   philosophical
arguments,  in  solving  puzzle problems  or  in  developing advanced
automation techniques  must  constantly be  taken  into account  when
discussing computer vision.


	Vision tasks that emphasize the continuity of the
visual process:

(cart task).

Given a computer controlled cart, explore and map the world.

	(Cart Hardware Description). The cart at the Stanford
Artificial Intelligence Laboratory is intended for outdoor use and
consists of four bicycle wheels, a piece of plywood, two car
batteries, a television camera, a television transmitter, and a
toy airplane radio receiver. (The vehicle being discussed is not
"Shakey", which belongs to the Stanford Research Institute's
Artificial Intelligence Group. There are two "Stanford-ish" A.I.
Labs and each has a computer controlled vehicle.) Logically the cart
has three motors, each of which can be commanded to run in one or the
other direction under computer control. The six possible cart action
commands are: run forwards, run backwards, steer to the left, steer
to the right, pan camera to the left, pan camera to the right.
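
	The command set just listed can be sketched as three motors
times two directions. This is only an illustration in Python, not the
laboratory's control software; the names Motor, Direction, PHRASES and
cart_commands are hypothetical, and which direction counts as
"positive" for each motor is an arbitrary assumption.

```python
from enum import Enum
from itertools import product


class Motor(Enum):
    DRIVE = "drive"
    STEER = "steer"
    PAN = "pan"


class Direction(Enum):
    POSITIVE = "+"
    NEGATIVE = "-"


# Hypothetical mapping from (motor, direction) to the command wording
# used in the text; the sign convention is an assumption.
PHRASES = {
    (Motor.DRIVE, Direction.POSITIVE): "run forwards",
    (Motor.DRIVE, Direction.NEGATIVE): "run backwards",
    (Motor.STEER, Direction.POSITIVE): "steer to the left",
    (Motor.STEER, Direction.NEGATIVE): "steer to the right",
    (Motor.PAN, Direction.POSITIVE): "pan camera to the left",
    (Motor.PAN, Direction.NEGATIVE): "pan camera to the right",
}


def cart_commands():
    """Enumerate every (motor, direction) pair as a command phrase."""
    return [PHRASES[pair] for pair in product(Motor, Direction)]


if __name__ == "__main__":
    for command in cart_commands():
        print(command)
```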

	(turn table task). The turn table task is to construct
a 3-D model from a sequence of 2-D television images taken
of an object rotated on a turn table.
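
	One way to read the turn table task is that rotating the
object in front of a fixed camera is equivalent to moving the camera
around a fixed object. The sketch below assumes the table axis is the
world z axis, that the table angle is known for each image, and that
the camera position is given in world coordinates; the function names
are hypothetical.

```python
import math


def camera_in_object_frame(camera_xyz, table_angle_deg):
    """Camera position as seen from the rotating object.

    If the object has turned by +angle about the z axis, the fixed
    camera appears, in the object's own frame, rotated by -angle.
    """
    x, y, z = camera_xyz
    a = math.radians(-table_angle_deg)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


def viewpoints(camera_xyz, step_deg, count):
    """Equivalent camera positions for a sequence of table steps."""
    return [camera_in_object_frame(camera_xyz, i * step_deg)
            for i in range(count)]


if __name__ == "__main__":
    for position in viewpoints((1.0, 0.0, 0.5), 90.0, 4):
        print(tuple(round(v, 3) for v in position))
```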

(blocks tasks).
	The classic block vision task, dating from
Roberts, consists of two parts: first convert a video image
into a line drawing; second, find a selection of
prototype blocks that account for the line drawing.
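
	The first part of the blocks task, going from a video image
toward a line drawing, usually starts by marking points of high
intensity gradient. The sketch below uses a Roberts cross style
difference on a small raster; the absolute-value sum is a common
approximation to the gradient magnitude, the threshold is an assumed
parameter, and linking the marked points into lines and matching
them to prototype blocks are not shown.

```python
def roberts_cross(image):
    """Diagonal-difference gradient estimate for a raster image.

    `image` is a list of equal-length rows of intensities.  The
    result has one fewer row and one fewer column than the input.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 1):
        row = []
        for c in range(cols - 1):
            d1 = image[r][c] - image[r + 1][c + 1]
            d2 = image[r][c + 1] - image[r + 1][c]
            row.append(abs(d1) + abs(d2))
        out.append(row)
    return out


def edge_points(image, threshold):
    """Raster positions whose gradient estimate reaches the threshold."""
    return [(r, c)
            for r, row in enumerate(roberts_cross(image))
            for c, g in enumerate(row)
            if g >= threshold]


if __name__ == "__main__":
    step = [[0, 0, 9, 9],
            [0, 0, 9, 9],
            [0, 0, 9, 9]]
    print(edge_points(step, 10))
```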

[single image vs. multiple images].
[perfect line drawing puzzles: Guzman & Waltz].
[imperfect line drawing analysis]

(Recognition tasks).
2.4	The Vision Cycle.
	
The structure of any computer vision  system can be analysed
as a mediator  between perceived images and a world model.

The two poles  of the vision  transducer are called the  "bottom" for
images  and  the  "top" for  models.    Although I  do  not  like the
vision-language analogy,   I  wish to  adopt the  top  and bottom  as
formal vision terminology, because it is concise and widely used.

Having established a top and a bottom, we can now introduce
those two jargon gems: top-down and bottom-up.

A  notion characteristic  of  my  approach  is the  observation  that
computer vision is  the inverse of computer graphics.  The problem of
computer graphics  is  to  synthesize images  from  three  dimensional
models;  the problem  of computer  vision is  to analyze  images into
three dimensional models.
	

The Vision Mandala.
	1. PREDICT	3D → 2D		synthesis	verification
	2. PERCEIVE	2D → 3D		analysis	revelation
	3. COMPARE			recognition
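
	The predict and compare steps of the cycle can be sketched
with a few lines of Python. The sketch assumes an ideal pinhole
camera at the origin looking along the z axis, a world model that is
just a list of 3-D points, and a perceived image that is already
reduced to a matching list of 2-D points; the analysis step that
would produce such points from a raster is not shown, and all names
are hypothetical.

```python
def predict(model_points, focal_length):
    """Synthesis: project 3-D model points into a predicted 2-D image."""
    return [(focal_length * x / z, focal_length * y / z)
            for x, y, z in model_points]


def compare(predicted, perceived):
    """Difference (perceived minus predicted) for each matched point."""
    return [(u2 - u1, v2 - v1)
            for (u1, v1), (u2, v2) in zip(predicted, perceived)]


if __name__ == "__main__":
    model = [(1.0, 2.0, 4.0)]
    perceived = [(0.75, 1.0)]
    predicted = predict(model, focal_length=2.0)
    print(predicted, compare(predicted, perceived))
```

	In a verification setting the differences returned by compare
would be the slight, expected but not yet measured discrepancies
used to correct the model.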

Three modes of operation on the vision cycle.

1. Revelation Vision - Data Driven Vision.
	(nearly pure bottom up vision).

2. Verification Vision - Model Driven Vision.
	(nearly pure top down vision).

3. Recognition Vision - feature classification.
	(bottom up random access into existing top).

   Vision.
	Heuristic Vision - guess and test.
	Accommodating Vision.
	(first bottom-up, next top-down, then verify and correct).

---------------------------------------------------------------------
In my special theory, the vision transducer is:

	1. Continuous rather than discrete.
	2. Exact rather than fuzzy; numeric rather than symbolic.
	3. Bidirectional rather than one way.

(Bidirectional).

	The vision transducer has three possible modes:
verification, revelation and recognition.

Depending on circumstances,  the vision transducer  should be able to
run  almost  entirely  top-down  (verification  vision) or  bottom-up
(revelation vision).  Verification vision is all that is  required in
a  well   known  and  consequently  predictable   environment;  whereas
revelation  vision is  required in  a brand  new or  rapidly changing
environment.

(recognition)

	Recognition   involves   comparing
perceived  data with predicted  data; such  recognition comparing can
be done on any  of the four  types of 2-D images  or the 3-D  models.
Arcane  recognition  techniques  can  be  avoided  by  improving  the
prediction and the analysis so that matches are nearly obvious.
2.5	The Nature of Images.

	There are three  basic kinds of  information in a  2-D visual
image:  photometric,   geometric,   and  topological; also  there are
three kinds of  2-D images: raster,  contour,   and mosaic.

---------------------------------------------------------------------
Assumption:	The perceived images are low quality, black and white,
		digitized television images.

Alternatives:	1. High quality electronic imaging device.
		2. Film scanning system.
		3. Active 3-D imaging device.
		4. Non-light devices: sound, radar, neutrinos, etc.

Discussion:

	The argument in favor of using low  quality, black and white,
television  images is  based on  poverty  rather than  principle. Low
quality television  is the  cheapest electronic  way  to perceive  an
image in real time.

	Although a super intellectual entity would have eyes that
could see the whole electromagnetic spectrum from gamma radiation to
direct current, as well as "voices" that could broadcast on any and
all frequencies; the video restriction
---------------------------------------------------------------------
	An image contains three basic kinds of data:
topological data, geometric data, and photometric data.

	The quality of the particular computer vision system
that one is condemned to use is a great influence on one's
theoretical approach.

	size of image
	photometric accuracy, bits per pixel
	resolution
	speed of image taking
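
	The quality parameters listed above combine into simple
storage and data rate figures. The arithmetic below is only a sketch;
the particular raster size, bits per pixel and image rate used in the
example are illustrative assumptions, not a description of any
particular hardware.

```python
def image_bits(width, height, bits_per_pixel):
    """Storage needed for one raster image, in bits."""
    return width * height * bits_per_pixel


def data_rate(width, height, bits_per_pixel, images_per_second):
    """Bits per second needed to take images at the given rate."""
    return image_bits(width, height, bits_per_pixel) * images_per_second


if __name__ == "__main__":
    bits = image_bits(256, 256, 4)
    print(bits, "bits =", bits // 8, "bytes per image")
    print(data_rate(256, 256, 4, 30), "bits per second")
```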
2.6	The Nature of Worlds.

	The rules about the  world that can be assumed a  priori by a
programmer   are  the  laws  of   physics;  programming  a  Newtonian
simulator of the mundane physical  world to a given approximation  is
difficult   but  more   fruitful  than   programming  an   Aristotelian
simulator.
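
	As a toy illustration of what "a Newtonian simulator to a
given approximation" can mean, the sketch below advances a single
point mass under constant gravity with a fixed time step; the step
size sets the approximation. The value of G, the choice of z as the
vertical axis, and the function names are assumptions made for the
example, and nothing here models solid bodies or collisions.

```python
G = 9.81  # assumed gravitational acceleration, metres per second squared


def step(position, velocity, dt):
    """Advance one time step: update velocity first, then position."""
    x, y, z = position
    vx, vy, vz = velocity
    vz = vz - G * dt
    return (x + vx * dt, y + vy * dt, z + vz * dt), (vx, vy, vz)


def simulate(position, velocity, dt, steps):
    """Apply `step` repeatedly and return the final state."""
    for _ in range(steps):
        position, velocity = step(position, velocity, dt)
    return position, velocity


if __name__ == "__main__":
    coarse, _ = simulate((0.0, 0.0, 10.0), (1.0, 0.0, 0.0), 0.1, 10)
    fine, _ = simulate((0.0, 0.0, 10.0), (1.0, 0.0, 0.0), 0.001, 1000)
    print(coarse[2], fine[2])  # a smaller step is closer to 10 - G/2
```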

(Reality Simulation).
---------------------------------------------------------------------
Assumption:	The world model should be a 3-D geometric model.

Alternatives:	1. Image memory and 2-D models.
		2. Procedural Knowledge, e.g. Hewitt & Winograd.
		3. Semantic knowledge, e.g. Wilks.
		4. Formal Logic models, e.g. McCarthy & Hayes.
		5. Statistical world model, e.g. Duda & Hart.
Arguments:
---------------------------------------------------------------------
Assumption:	Partial knowledge is represented by approximation.

Alternatives:	1. Tree of possibilities.
		2. Multi valued logic.
		3. Probabilities.

Arguments:
---------------------------------------------------------------------
2.7	Locus Solving.
	1. Camera Locus Solving.
	2. Body Locus Solving.
		Silhouette Cone Intersection.
		Envelope bodies.
	3. Sun Locus Solving.
		(compute it, look at it, shine and shadows).

	The crux  of computer vision  is to deduce  information about
the  world  being  viewed  from images  of  that  world.   The  world
information  most  directly  relevant  to  vision  is   the  physical
location,   extent and  light scattering  properties of  solid opaque
objects;  the location,  orientation and  scales of the  cameras that
take the pictures;  and the location and  nature of the lights  that
illuminate  the  world. Accordingly, three  important vision problems
are  camera solving, body solving,  and sun  solving.

The macroscopic  world doesn't change  very rapidly; between  any two
world states  there is an intermediate world  state.  Parallax is the
principal means of depth perception.  Parallax is the  alchemist that
converts 2-D images  into 3-D models. Revelation vision  is a process
of  comparing perceived  images taken in sequence  and constructing a
3-D model of the unanticipated objects.
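
	The way parallax converts image measurements into depth can
be shown for the simplest case. The sketch assumes an ideal pinhole
camera that is translated sideways by a known baseline between two
images, with parallel optical axes, and assumes that the same point
has already been found in both images; general camera motions would
first require solving for the camera locus. The function names are
hypothetical.

```python
def depth_from_parallax(u_first, u_second, baseline, focal_length):
    """Depth of a point from its horizontal shift between two images.

    u_first and u_second are the image x coordinates of the same point
    before and after the camera moved `baseline` along its x axis.
    focal_length is in the same units as the image coordinates.
    """
    shift = u_first - u_second
    if shift == 0:
        raise ValueError("no parallax: depth cannot be determined")
    return focal_length * baseline / shift


def point_from_parallax(u_first, u_second, baseline, focal_length):
    """Recover (x, z) of the point in the first camera's frame."""
    z = depth_from_parallax(u_first, u_second, baseline, focal_length)
    return (u_first * z / focal_length, z)


if __name__ == "__main__":
    print(point_from_parallax(50.0, 25.0, 0.1, 500.0))
```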
2.8	Grand Vision Theory.

"For the purpose of  presenting my argument I must  first explain the
basic  premise of sorcery as  don Juan presented  it to me.   He said
that for a sorcerer, the world  of everyday life is not real, or  out
there, as we believe  it is. For a sorcerer, reality  or the world we
all  know, is  only a  description. For  the sake of  validating this
premise don Juan  concentrated the best  of his efforts into  leading
me to a  genuine conviction that what I held in  mind as the world at
hand was merely a  description of the world;  a description that  had
been pounded into me from the moment I was born."

			- Carlos Castaneda. Journey to Ixtlan.

	The larger context of a vision theory depends on one's
opinion about the nature of biological perception. It is my opinion
that mind is to matter as computer software is to computer
hardware. That is, mind is a program that is running in the brain.
Now what software can account for consciousness, the inner private
life of the self that burns in our heads? The so called stream of
consciousness consists of little voice(s) talking, fragments of
music playing, and most important there is the flow of the here and
now. The here and now is the totality of the particular sights,
sounds, smells, and so on that are being played in your head in
sync with the respective sensory stimuli. So I believe that the
major computation being performed by an intellectual entity in order
to stay conscious of its external world is a reality simulation.

{mimicry arguments: for and against}.

2.9	Summary of Arguments.

1. Preference for continuous rather than discrete vision.
2. Preference for the descriptive approach rather than the
   classification model.
3. Preference for working with real images rather than
   with puzzle images.